
New Training Data Labeling System for Machine Learning Helps Developers

#artificialintelligence

Machine learning (ML) has become one of the most prominent forms of data analysis for everything from fraud detection to visual quality control. Yet the analytic results can often suffer from insufficiently labeled training data. A team of Georgia Tech researchers has created a system that allows users to more effectively label a training dataset with higher accuracy than current methods. "We are looking at the problem from a data management perspective," said School of Computer Science (SCS) Assistant Professor Xu Chu. "In contrast to a lot of ML research that tries to tackle the lack of sufficient training data from an ML algorithm design perspective, we aim at building a system that helps users effectively label a dataset."


The Ultimate Guide to Model Retraining - KDnuggets

#artificialintelligence

Machine learning models are trained by learning a mapping between a set of input features and an output target. Typically, this mapping is learned by optimizing some cost function to minimize prediction error. Once the optimal model is found, it's released out into the wild with the goal of generating accurate predictions on future unseen data. Depending on the problem, these new data examples may be generated from user interactions, scheduled processes, or requests from other software systems. Ideally, we hope that our models predict these future instances as accurately as they did on the data used during the training process.
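The train-deploy-retrain loop described above can be sketched in a few lines. This is a minimal illustration with a linear model and synthetic data, not a production pipeline; the function names and the choice of closed-form least squares are our own assumptions, not anything from the article.

```python
import numpy as np

def fit_least_squares(X, y):
    """Learn a linear mapping from features to target by minimizing
    squared prediction error (closed-form normal equations)."""
    Xb = np.c_[np.ones(len(X)), X]          # prepend a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    return np.c_[np.ones(len(X)), X] @ w

# Initial training set: noisy samples of y = 2x + 1
rng = np.random.default_rng(0)
X0 = rng.uniform(0, 5, size=(50, 1))
y0 = 2 * X0.ravel() + 1 + rng.normal(scale=0.1, size=50)
w = fit_least_squares(X0, y0)            # train and "deploy"

# Later, new labeled examples arrive (user interactions, scheduled
# processes, ...); retrain on the combined old + new data.
X1 = rng.uniform(5, 10, size=(50, 1))
y1 = 2 * X1.ravel() + 1 + rng.normal(scale=0.1, size=50)
w = fit_least_squares(np.vstack([X0, X1]), np.concatenate([y0, y1]))
# w should now be close to the true parameters [1, 2]
```

In practice the retraining trigger (a schedule, a drift detector, a performance threshold) matters as much as the fit itself, which is the subject of the article.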



Incremental Learning for Semantic Segmentation of Large-Scale Remote Sensing Data

Tasar, Onur, Tarabalka, Yuliya, Alliez, Pierre

arXiv.org Machine Learning

In spite of the remarkable success of convolutional neural networks on semantic segmentation, they suffer from catastrophic forgetting: a significant performance drop for the already-learned classes when new classes are added to the data and no annotations are available for the old classes. We propose an incremental learning methodology that enables the network to learn to segment new classes without hindering its dense-labeling abilities for the previous classes, even though the entire previous dataset is not accessible. The key points of the proposed approach are adapting the network to learn new as well as old classes on the new training data, and allowing it to remember the previously learned information for the old classes. For adaptation, we keep a frozen copy of the previously trained network, which is used as a memory for the updated network in the absence of annotations for the former classes. The updated network minimizes a loss function that balances the discrepancy between the outputs for the previous classes from the memory and updated networks, and the misclassification rate between the outputs for the new classes from the updated network and the new ground truth. For remembering, we either regularly feed samples from a small stored fraction of the previous data or use the memory network, depending on whether the new data are collected from completely different geographic areas or from the same city. Our experimental results show that it is possible to add new classes to the network while maintaining its performance for the previous classes, even though the whole previous training set is not available.
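The balance the abstract describes can be sketched as a two-term loss. This is a minimal NumPy illustration of the general idea, not the authors' implementation: the squared-difference distillation term, the `lam` weight, and the toy shapes are all our assumptions; the paper's actual loss may differ in form.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def incremental_loss(new_logits, memory_logits, labels, lam=1.0):
    """Combine the two terms described in the abstract:
      - a distillation term keeping the updated network's outputs for the
        OLD classes close to the frozen memory network's outputs;
      - a cross-entropy term fitting the NEW classes to the new ground truth.
    `lam` weighs the distillation term (a hyperparameter we assume here)."""
    n_old = memory_logits.shape[1]          # memory net only knows old classes
    p_new = softmax(new_logits)
    p_mem = softmax(memory_logits)
    # Discrepancy on the old-class outputs (squared difference for simplicity)
    distill = np.mean((p_new[:, :n_old] - p_mem) ** 2)
    # Misclassification term on the new classes (labels are class indices)
    ce = -np.mean(np.log(p_new[np.arange(len(labels)), labels] + 1e-12))
    return ce + lam * distill

# Toy batch: 4 pixels, 3 old classes + 2 new classes = 5 output logits
rng = np.random.default_rng(1)
new_logits = rng.normal(size=(4, 5))        # updated network (old + new heads)
memory_logits = rng.normal(size=(4, 3))     # frozen memory network (old heads)
labels = np.array([3, 4, 3, 4])             # new ground truth: new classes only
loss = incremental_loss(new_logits, memory_logits, labels)
```

Setting `lam` high preserves old-class behavior at the cost of slower adaptation to the new classes; setting it to zero recovers plain fine-tuning and, with it, catastrophic forgetting.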


Creating a Machine Learning Commons for Global Development

#artificialintelligence

Advances in sensor technology, cloud computing, and machine learning (ML) continue to converge to accelerate innovation in the field of remote sensing. However, fundamental tools and technologies still need to be developed to drive further breakthroughs and to ensure that the Global Development Community (GDC) reaps the same benefits that the commercial marketplace is experiencing. This process requires us to take a collaborative approach. Data collaborative innovation -- that is, a group of actors from different data domains working together toward common goals -- might hold the key to finding solutions for some of the global challenges that the world faces. That is why Radiant.Earth is investing in new technologies such as Cloud Optimized GeoTIFFs (COGs), the SpatioTemporal Asset Catalog (STAC), and ML. Our approach to advancing ML for global development begins with creating open libraries of labeled images and algorithms.


AI Can Help Cybersecurity--If It Can Fight Through the Hype

WIRED

Walking the enormous exhibition halls at the recent RSA security conference in San Francisco, you could have easily gotten the impression that digital defense was a solved problem. Amidst branded t-shirts and water bottles, each booth hawked software and hardware that promised impenetrable defenses and peace of mind. Artificial intelligence, the sales pitch invariably goes, can instantly spot any malware on a network, guide incident response, and detect intrusions before they start. That rosy view of what AI can deliver isn't entirely wrong. But what next-generation techniques actually do is more muddled and incremental than marketers want to admit.


What is the difference between Bagging and Boosting?

@machinelearnbot

Bagging and Boosting are similar in that they are both ensemble techniques, where a set of weak learners is combined to create a strong learner that obtains better performance than any single one. So, let's start from the beginning: an ensemble is a Machine Learning concept in which the idea is to train multiple models using the same learning algorithm. Ensembles belong to a bigger group of methods, called multiclassifiers, where a set of hundreds or thousands of learners with a common objective are fused together to solve the problem. The second group of multiclassifiers contains the hybrid methods. They also use a set of learners, but these can be trained using different learning techniques.
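The contrast between the two can be sketched with weighted decision stumps on toy data: bagging trains each weak learner on an independent bootstrap sample and averages the votes, while boosting trains them sequentially, reweighting the examples the previous learners missed. A minimal illustration, not a reference implementation; `fit_stump`, the 25 rounds, and the 10% label noise are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y, w):
    """Weighted decision stump on one feature: pick the threshold and sign
    minimizing weighted error. A deliberately weak learner."""
    best = (np.inf, 0.0, 1)
    for t in np.unique(X):
        for sign in (1, -1):
            pred = np.where(X >= t, sign, -sign)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, t, sign)
    return best

def stump_predict(t, sign, X):
    return np.where(X >= t, sign, -sign)

# Toy 1-D data: label +1 above 0.5, -1 below, with 10% label noise
X = rng.uniform(0, 1, 200)
y = np.where(X > 0.5, 1, -1)
y[rng.random(200) < 0.1] *= -1

# --- Bagging: each stump sees a bootstrap sample; majority vote ---
bag_preds = []
for _ in range(25):
    idx = rng.integers(0, len(X), len(X))       # sample with replacement
    _, t, s = fit_stump(X[idx], y[idx], np.ones(len(idx)) / len(idx))
    bag_preds.append(stump_predict(t, s, X))
bagged = np.sign(np.sum(bag_preds, axis=0))

# --- Boosting (AdaBoost-style): upweight the examples stumps got wrong ---
w = np.ones(len(X)) / len(X)
score = np.zeros(len(X))
for _ in range(25):
    err, t, s = fit_stump(X, y, w)
    err = max(err, 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)       # weight of this weak learner
    pred = stump_predict(t, s, X)
    w *= np.exp(-alpha * y * pred)              # reweight misclassified points
    w /= w.sum()
    score += alpha * pred
boosted = np.sign(score)
```

Note the structural difference: the bagged learners are trained independently (and could run in parallel), while each boosting round depends on the weights produced by the previous one.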


From Data to AI with the Machine Learning Canvas (Part I)

#artificialintelligence

Machine Learning systems are complex. At their core, they ingest data in a certain format, to build models that are able to predict the future. A famous example in the industry is identifying fragile customers, who may stop being customers within a certain number of days (the "churn" problem). These predictions only become valuable when they are used to inform or to automate decisions (e.g. which promotional offers to give to which customers, to make them stay). In many organizations, there is often a disconnect between the people who are able to build accurate predictive models, and those who know how to best serve the organization's objectives.

